Results 1 - 20 of 874
1.
Cell Rep ; 38(7): 110364, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35172134

ABSTRACT

Mesendodermal specification is one of the earliest events in embryogenesis, where cells first acquire distinct identities. Cell differentiation is a highly regulated process that involves the function of numerous transcription factors (TFs) and signaling molecules, which can be described with gene regulatory networks (GRNs). Cell differentiation GRNs are difficult to build because existing mechanistic methods are low throughput, and high-throughput methods tend to be non-mechanistic. Additionally, integrating highly dimensional data composed of more than two data types is challenging. Here, we use linked self-organizing maps to combine chromatin immunoprecipitation sequencing (ChIP-seq)/ATAC-seq with temporal, spatial, and perturbation RNA sequencing (RNA-seq) data from Xenopus tropicalis mesendoderm development to build a high-resolution genome scale mechanistic GRN. We recover both known and previously unsuspected TF-DNA/TF-TF interactions validated through reporter assays. Our analysis provides insights into transcriptional regulation of early cell fate decisions and provides a general approach to building GRNs using highly dimensional multi-omic datasets.


Subjects
Endoderm/embryology , Gene Regulatory Networks , Genomics , Mesoderm/embryology , Xenopus/embryology , Xenopus/genetics , Animals , Chromatin/metabolism , Consensus Sequence/genetics , DNA/metabolism , Gastrulation/genetics , Gene Expression Regulation, Developmental , Protein Binding , RNA/metabolism , Transcription Factors/metabolism , Transcription, Genetic
2.
EBioMedicine ; 75: 103750, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34922323

ABSTRACT

BACKGROUND: Long non-coding RNAs (lncRNAs) have recently emerged as essential biomarkers of cancer progression. However, studies are limited regarding lncRNAs correlated with recurrence and fluorouracil-based adjuvant chemotherapy (ACT) in stage II/III colorectal cancer (CRC). METHODS: A total of 1640 stage II/III CRC patients were enrolled from 15 independent datasets and a clinical in-house cohort. Ten prevalent machine learning algorithms were collected and then combined into 76 combinations. In addition, 109 published transcriptome signatures were retrieved. A qRT-PCR assay was performed to verify our model. FINDINGS: We comprehensively identified 27 stably recurrence-related lncRNAs from multi-center cohorts. Based on these lncRNAs, a consensus machine learning-derived lncRNA signature (CMDLncS) that exhibited the best power for predicting recurrence risk was determined from the 76 algorithm combinations. A high CMDLncS indicated unfavorable recurrence and mortality rates. CMDLncS not only worked independently of common clinical traits (e.g., AJCC stage) and molecular features (e.g., microsatellite state, KRAS mutation), but also performed dramatically better than these variables. qRT-PCR results from 173 patients further verified our in silico findings and assessed the feasibility of the signature in different centers. Comparisons of CMDLncS with 109 published transcriptome signatures further demonstrated its predictive superiority. Additionally, patients with high CMDLncS benefited more from fluorouracil-based ACT and were characterized by activation of stromal and epithelial-mesenchymal transition, while patients with low CMDLncS appeared sensitive to bevacizumab and displayed enhanced immune activation. INTERPRETATION: CMDLncS provides an attractive platform for identifying patients at high risk of recurrence and could optimize precision treatment to improve clinical outcomes in stage II/III CRC.
FUNDING: This study was supported by the National Natural Science Foundation of China (81972663); Henan Province Young and Middle-Aged Health Science and Technology Innovation Talent Project (YXKC2020037); and Henan Provincial Health Commission Joint Youth Project (SB201902014).


Subjects
Colorectal Neoplasms , Consensus Sequence , RNA, Long Noncoding , Adolescent , Biomarkers, Tumor/genetics , Colorectal Neoplasms/drug therapy , Colorectal Neoplasms/genetics , Consensus Sequence/genetics , Humans , Machine Learning , Middle Aged , Neoplasm Recurrence, Local/genetics , Prognosis , RNA, Long Noncoding/genetics
3.
Nat Protoc ; 16(7): 3625-3638, 2021 07.
Article in English | MEDLINE | ID: mdl-34089018

ABSTRACT

The most common nonstandard nucleotides found in genomic DNA are ribonucleotides. Although ribonucleotides are frequently incorporated into DNA by replicative DNA polymerases, very little is known about the distribution and signatures of ribonucleotides incorporated into DNA. Recent advances in high-throughput ribonucleotide sequencing can capture the exact locations of ribonucleotides in genomic DNA. Ribose-Map is a user-friendly, standardized bioinformatics toolkit for the comprehensive analysis of ribonucleotide sequencing experiments. It allows researchers to map the locations of ribonucleotides in DNA to single-nucleotide resolution and identify biological signatures of ribonucleotide incorporation. In addition, it can be applied to data generated using any currently available high-throughput ribonucleotide sequencing technique, thus standardizing the analysis of ribonucleotide sequencing experiments and allowing direct comparisons of results. This protocol describes in detail how to use Ribose-Map to analyze ribonucleotide sequencing data, including preparing the reads for analysis, locating the genomic coordinates of ribonucleotides, exploring the genome-wide distribution of ribonucleotides, determining the nucleotide sequence context of ribonucleotides and identifying hotspots of ribonucleotide incorporation. Ribose-Map does not require background knowledge of ribonucleotide sequencing analysis and assumes only basic command-line skills. The protocol requires less than 3 h of computing time for most datasets and ~30 min of hands-on time. Ribose-Map is available at https://github.com/agombolay/ribose-map .


Subjects
DNA, Fungal/genetics , Genome , Genomics/methods , Ribonucleotides/genetics , Ribose/metabolism , Saccharomyces cerevisiae/genetics , Base Sequence , Consensus Sequence/genetics , DNA, Mitochondrial/genetics
4.
Mol Biol Rep ; 48(3): 2223-2233, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33689093

ABSTRACT

TEOSINTE BRANCHED 1/CYCLOIDEA/PROLIFERATING CELL FACTOR 1 (TCP) transcription factors control multiple aspects of growth and development in various plant species. However, few genes have been reported to be directly targeted and regulated by them through their specific binding sites, so their functions in plants remain to be uncovered. A consensus DNA-binding site motif of TCP2 was identified by random binding site selection (RBSS). DNA recognized by TCP2 contained the motif G(G/T)GGNCC(A/C), which showed high consistency with motifs bound by other TCP domain proteins. Consequently, this motif was regarded as the specific DNA-binding site of TCP2. Circadian clock associated 1 (CCA1) and EARLY FLOWERING 3 (ELF3) were subsequently considered potential target genes because they contain similar TCP2 binding sites or the core binding site GGNCC, and they were found to be positively regulated by TCP2 via DNA binding. Phenotype analysis showed that mutation and over-expression of TCP2 resulted in variations in leaf morphogenesis, especially in the double or triple mutants of TCP2, 4 and 10. Mutations in TCPs caused late flowering. Finally, TCP2 was shown to influence hypocotyl elongation by mediating the jasmonate signaling pathway. Overall, these results provide a basis for future studies aimed at distinguishing the target genes of TCP2 and elucidating the important roles of TCP2 in plant growth and development.


Subjects
Arabidopsis Proteins/metabolism , Arabidopsis/growth & development , Arabidopsis/genetics , Binding Sites/genetics , Consensus Sequence/genetics , DNA, Plant/metabolism , Transcription Factors/metabolism , Arabidopsis Proteins/chemistry , Arabidopsis Proteins/genetics , Base Sequence , Cyclopentanes/metabolism , Flowers/physiology , Gene Expression Regulation, Plant , Hypocotyl/growth & development , Morphogenesis/genetics , Mutation/genetics , Oxylipins/metabolism , Plant Leaves/growth & development , Protein Binding , Protein Domains , Signal Transduction , Time Factors , Transcription Factors/chemistry , Transcription Factors/genetics
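
The consensus motif quoted in the abstract of entry 4, G(G/T)GGNCC(A/C), can be searched for in a DNA sequence with a regular expression. The following is only a minimal sketch and not the authors' method: the helper names `motif_to_regex` and `find_sites` are hypothetical, the example sequence is made up, N is assumed to mean any of A, C, G or T, and only the forward strand is scanned.

```python
import re

# Hypothetical helper: turn the motif notation used in the abstract,
# e.g. G(G/T)GGNCC(A/C), into a regular expression.
def motif_to_regex(motif):
    # "(G/T)" becomes the character class "[GT]".
    pattern = re.sub(
        r"\(([ACGT](?:/[ACGT])*)\)",
        lambda m: "[" + m.group(1).replace("/", "") + "]",
        motif,
    )
    # Assumption: N stands for any of the four bases.
    return pattern.replace("N", "[ACGT]")

# Hypothetical helper: return (0-based start, matched site) pairs.
# A lookahead is used so that overlapping sites are also reported.
def find_sites(sequence, motif):
    regex = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]

TCP2_MOTIF = "G(G/T)GGNCC(A/C)"
example = "AAGTGGACCATTGGGGTCCCAA"  # made-up sequence, not from the paper
sites = find_sites(example, TCP2_MOTIF)
print(sites)
```

To cover both strands, the same scan could be repeated on the reverse complement of the sequence.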
5.
Arch Virol ; 166(1): 43-64, 2021 Jan.
Article in English | MEDLINE | ID: mdl-33052487

ABSTRACT

Leucine-rich repeats (LRRs) are present in over 563,000 proteins from viruses to eukaryotes. LRRs repeat in tandem and have been classified into fifteen classes in which the repeat unit lengths range from 20 to 29 residues. Most LRR proteins are involved in protein-protein or ligand interactions. The amount of genome sequence data from viruses is increasing rapidly, and although viral LRR proteins have been identified, a comprehensive sequence analysis has not yet been done, and their structures, functions, and evolution are still unknown. In the present study, we characterized viral LRRs by sequence analysis and identified over 600 LRR proteins from 89 virus species. Most of these proteins were from double-stranded DNA (dsDNA) viruses, including nucleocytoplasmic large dsDNA viruses (NCLDVs). We found that the repeating unit lengths of 11 types are one to five residues shorter than those of the seven known corresponding LRR classes. The repeating units of six types are 19 residues long and are thus the shortest among all LRRs. In addition, two of the LRR types are unique and have not been observed in bacteria, archaea or eukaryotes. Conserved strongly hydrophobic residues such as Leu, Val or Ile in the consensus sequences are replaced by Cys with high frequency. Phylogenetic analysis indicated that horizontal gene transfer of some viral LRR genes had occurred between the virus and its host. We suggest that the shortening might contribute to the survival strategy of viruses. The present findings provide a new perspective on the origin and evolution of LRRs.


Subjects
DNA/genetics , Leucine/genetics , Repetitive Sequences, Amino Acid/genetics , Viruses/genetics , Archaea/virology , Bacteria/virology , Consensus Sequence/genetics , Eukaryota/virology , Phylogeny , Viral Proteins/genetics
6.
Nat Commun ; 11(1): 6023, 2020 11 26.
Article in English | MEDLINE | ID: mdl-33243970

ABSTRACT

The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein's amino acids ('intra-gene epistasis'). Our limited understanding of such epistasis hinders the correct prediction of the functional contributions and adaptive potential of mutations. Here we present a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow (UMIC-seq) that simplifies mapping of evolutionary trajectories based on full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes with nanopore sequencing. We exemplify the utility of this approach by reconstructing the artificial phylogeny emerging in three rounds of directed evolution of an amine dehydrogenase biocatalyst via ultrahigh throughput droplet screening. Uniquely, we are able to identify lineages and their founding variant, as well as non-additive interactions between mutations within a full gene showing sign epistasis. Access to deep and accurate long reads will facilitate prediction of key beneficial mutations and adaptive potential based on in silico analysis of large sequence datasets.


Subjects
Directed Molecular Evolution , High-Throughput Nucleotide Sequencing/methods , High-Throughput Screening Assays/methods , Oxidoreductases Acting on CH-NH Group Donors/genetics , Protein Engineering/methods , Biocatalysis , Cloning, Molecular , Computational Biology/methods , Consensus Sequence/genetics , Datasets as Topic , Enzyme Assays , Epistasis, Genetic , Gene Library , Mutagenesis , Mutation , Oxidoreductases Acting on CH-NH Group Donors/isolation & purification , Oxidoreductases Acting on CH-NH Group Donors/metabolism , Phylogeny , Recombinant Proteins/genetics , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism , Software
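
The idea behind UMI-linked consensus generation in entry 6 can be illustrated with a toy example: reads that carry the same UMI are grouped, and each position is decided by majority vote. This is a simplified sketch, not the UMIC-seq workflow itself. It assumes that reads within a UMI group are already aligned and of equal length, that UMIs are matched exactly, and that a group needs at least `min_reads` members; the function name and threshold are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical helper: majority-vote consensus per UMI.
# reads is an iterable of (umi, sequence) pairs.
def umi_consensus(reads, min_reads=3):
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        # Skip UMIs with too few reads to vote reliably (assumed threshold).
        if len(seqs) < min_reads:
            continue
        # zip(*seqs) walks the reads column by column.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus

# Made-up reads: one error in the second read of UMI "AAC".
reads = [
    ("AAC", "ACGT"),
    ("AAC", "ACGA"),
    ("AAC", "ACGT"),
    ("GGT", "TTTT"),
    ("GGT", "TTTT"),
]
result = umi_consensus(reads)
print(result)
```

Real nanopore data would additionally need UMI clustering that tolerates sequencing errors and an alignment step before voting.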
7.
Front Cell Infect Microbiol ; 10: 575613, 2020.
Article in English | MEDLINE | ID: mdl-33123498

ABSTRACT

Background: The ongoing pandemic of SARS-CoV-2 has already infected more than eight million people worldwide. The majority of COVID-19 patients either are asymptomatic or have mild symptoms. Yet, about 15% of the cases experience severe complications and require intensive care. Factors determining disease severity are not yet fully characterized. Aim: Here, we investigated the within-host virus diversity in COVID-19 patients with different clinical manifestations. Methods: We compared SARS-CoV-2 genetic diversity in 19 mild and 27 severe cases. Viral RNA was extracted from nasopharyngeal samples and sequenced using the Illumina MiSeq platform. This was followed by deep-sequencing analyses of SARS-CoV-2 genomes at both consensus and sub-consensus sequence levels. Results: Consensus sequences of all viruses were very similar, showing more than 99.8% sequence identity regardless of the disease severity. However, the sub-consensus analysis revealed significant differences in within-host diversity between mild and severe cases. Patients with severe symptoms exhibited a significantly (p-value 0.001) higher number of variants in coding and non-coding regions compared to mild cases. Analysis also revealed higher prevalence of some variants among severe cases. Most importantly, severe cases exhibited significantly higher within-host diversity (mean = 13) compared to mild cases (mean = 6). Further, higher within-host diversity was observed in patients above the age of 60 compared to the younger age group. Conclusion: These observations provided evidence that within-host diversity might play a role in the development of severe disease outcomes in COVID-19 patients; however, further investigations are required to elucidate this association.


Subjects
Betacoronavirus/classification , Betacoronavirus/genetics , Genetic Variation/genetics , Genome, Viral/genetics , Severity of Illness Index , Adult , Aged , COVID-19 , Consensus Sequence/genetics , Coronavirus Infections/pathology , Female , Humans , Male , Middle Aged , Pandemics , Pneumonia, Viral/pathology , RNA, Viral/genetics , Risk Factors , SARS-CoV-2 , Sequence Analysis, RNA , Young Adult
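
The distinction between consensus and sub-consensus levels in entry 7 can be made concrete with per-position base counts: the most frequent base gives the consensus, and any other base above a frequency cutoff counts as a within-host (minor) variant. The sketch below is illustrative only; the function name, the 5% cutoff and the input layout (one dict of base counts per position) are assumptions, not details taken from the study.

```python
# Hypothetical helper: derive the consensus and list minor variants.
# pileup is a list with one non-empty {base: read_count} dict per position.
def call_variants(pileup, min_freq=0.05):
    consensus = []
    minor_variants = []
    for pos, counts in enumerate(pileup):
        depth = sum(counts.values())
        # Sort by descending count, then alphabetically to break ties.
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        consensus.append(ranked[0][0])
        for base, n in ranked[1:]:
            # Keep non-consensus bases at or above the assumed cutoff.
            if depth and n / depth >= min_freq:
                minor_variants.append((pos, base, n / depth))
    return "".join(consensus), minor_variants

# Made-up counts for three positions.
pileup = [
    {"A": 100},
    {"C": 90, "T": 10},   # 10% minor variant, reported
    {"G": 99, "A": 1},    # 1% minor variant, below cutoff
]
consensus, minor = call_variants(pileup)
print(consensus, minor)
```

Two samples with identical consensus strings can still differ in the length of the minor-variant list, which is the kind of difference the abstract describes between mild and severe cases.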
8.
Curr Biol ; 30(22): 4454-4466.e5, 2020 11 16.
Article in English | MEDLINE | ID: mdl-32976810

ABSTRACT

Many protein-modifying enzymes recognize their substrates via docking motifs, but the range of functionally permissible motif sequences is often poorly defined. During eukaryotic cell division, cyclin-specific docking motifs help cyclin-dependent kinases (CDKs) phosphorylate different substrates at different stages, thus enforcing a temporally ordered series of events. In budding yeast, CDK substrates with Leu/Pro-rich (LP) docking motifs are recognized by Cln1/2 cyclins in late G1 phase, yet the key sequence features of these motifs were unknown. Here, we comprehensively analyze LP motif requirements in vivo by combining a competitive growth assay with deep mutational scanning. We quantified the effect of all single-residue replacements in five different LP motifs by using six distinct G1 cyclins from diverse fungi including medical and agricultural pathogens. The results uncover substantial tolerance for deviations from the consensus sequence, plus requirements at some positions that are contingent on the favorability of other motif residues. They also reveal the basis for variations in functional potency among wild-type motifs, and allow derivation of a quantitative matrix that predicts the strength of other candidate motif sequences. Finally, we find that variation in docking motif potency can advance or delay the time at which CDK substrate phosphorylation occurs, and thereby control the temporal ordering of cell cycle regulation. The overall results provide a general method for surveying viable docking motif sequences and quantifying their potency in vivo, and they reveal how variations in docking strength can tune the degree and timing of regulatory modifications.


Subjects
Cyclin-Dependent Kinases/metabolism , Cyclins/genetics , G1 Phase , Protein Domains/genetics , Saccharomyces cerevisiae Proteins/genetics , Amino Acid Motifs/genetics , Consensus Sequence/genetics , Cyclins/metabolism , DNA Mutational Analysis , DNA, Fungal/genetics , DNA, Fungal/isolation & purification , Phosphorylation/genetics , Protein Binding/genetics , Saccharomyces cerevisiae , Saccharomyces cerevisiae Proteins/metabolism
9.
Nature ; 585(7825): 459-463, 2020 09.
Article in English | MEDLINE | ID: mdl-32908305

ABSTRACT

The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to the initiation of DNA transcription [1-5], but the downstream core promoter in humans has been difficult to understand [1-3]. Here we analyse the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants, each with known transcriptional strength. We then analysed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These results show that the DPR is a functionally important core promoter element that is widely used in human promoters. Notably, there appears to be a duality between the DPR and the TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.


Subjects
Consensus Sequence/genetics , Gene Expression Regulation/genetics , Promoter Regions, Genetic/genetics , RNA Polymerase II/metabolism , Support Vector Machine , Transcription, Genetic , Base Sequence , Cells/metabolism , Computer Simulation , Datasets as Topic , HeLa Cells , High-Throughput Nucleotide Sequencing , Humans , Models, Genetic , Mutagenesis , TATA Box/genetics
10.
Nat Commun ; 11(1): 3224, 2020 06 26.
Article in English | MEDLINE | ID: mdl-32591528

ABSTRACT

In plants, epigenetic regulation is critical for silencing transposons and maintaining proper gene expression. However, its impact on the genome-wide transcription initiation landscape remains elusive. By conducting a genome-wide analysis of transcription start sites (TSSs) using cap analysis of gene expression (CAGE) sequencing, we show that thousands of TSSs are exclusively activated in various epigenetic mutants of Arabidopsis thaliana; we refer to these as cryptic TSSs. Many of them have not been identified in previous studies, and up to 65% of these are contributed by transposons. They possess similar genetic features to regular TSSs and their activation is strongly associated with the ectopic recruitment of RNAPII machinery. The activation of cryptic TSSs significantly alters transcription of nearby TSSs, including those of genes important for development and stress responses. Our study, therefore, sheds light on the role of epigenetic regulation in maintaining proper gene functions in plants by suppressing transcription from cryptic TSSs.


Subjects
Arabidopsis/genetics , Epigenesis, Genetic , Gene Expression Regulation, Plant , Transcription, Genetic , Base Sequence , Consensus Sequence/genetics , DNA Methylation/genetics , DNA Polymerase beta/metabolism , DNA Transposable Elements/genetics , Genes, Plant , Mutation/genetics , RNA Polymerase II/metabolism , Transcription Initiation Site , Transcriptome/genetics
11.
Biotechnol Lett ; 42(8): 1305-1315, 2020 Aug.
Article in English | MEDLINE | ID: mdl-32430802

ABSTRACT

Multiple sequence alignment (MSA) is a fundamental way to gain information that cannot be obtained from the analysis of any individual sequence included in the alignment. It provides ways to investigate the relationship between sequence and function from an evolutionary perspective. Thus, the MSA of proteins can be employed as a reference for protein engineering. In this paper, we review recent advances to highlight how protein engineering has benefited from the MSA of proteins. These methods include (1) engineering the thermostability or solubility of proteins by bringing them closer to the consensus sequence of the alignment through site mutations; (2) structure-based engineering of proteins with comparative modeling; (3) creating paleoenzymes with high thermostability and promiscuity by constructing ancestral sequences derived from multiple sequence alignment; and (4) incorporating site mutations targeting the evolutionarily coupled sites identified from multiple sequence alignment.


Subjects
Protein Engineering/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Amino Acid Sequence/genetics , Consensus Sequence/genetics , Mutation/genetics , Protein Stability , Proteins/chemistry , Proteins/genetics , Proteins/metabolism
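
Method (1) in the abstract of entry 11, moving a protein closer to the consensus sequence of an alignment, can be sketched as a comparison of a target sequence with the per-column majority residue. The code below is a hedged illustration rather than a procedure from the review: the function name, the 60% conservation cutoff and the toy alignment are assumptions, and positions are numbered by alignment column (1-based), not by the target's own residue numbering.

```python
from collections import Counter

# Hypothetical helper: list back-to-consensus substitutions for a target.
# alignment is a list of equal-length aligned sequences ("-" marks a gap);
# target must be aligned to the same columns.
def consensus_mutations(alignment, target, min_fraction=0.6):
    suggestions = []
    for i, column in enumerate(zip(*alignment)):
        residues = [r for r in column if r != "-"]
        if not residues:
            continue
        residue, count = Counter(residues).most_common(1)[0]
        conserved = count / len(residues) >= min_fraction
        # Suggest a change only where the target differs from a
        # sufficiently conserved consensus residue.
        if conserved and target[i] not in ("-", residue):
            suggestions.append(f"{target[i]}{i + 1}{residue}")
    return suggestions

# Made-up four-sequence alignment and target.
alignment = ["MKLV", "MKIV", "MRLV", "MKLA"]
target = "MRLA"
mutations = consensus_mutations(alignment, target)
print(mutations)
```

Whether any suggested substitution actually improves thermostability or solubility would still have to be tested experimentally.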
12.
J Vis Exp ; (157)2020 03 11.
Article in English | MEDLINE | ID: mdl-32225162

ABSTRACT

Whole genome sequencing can be used to characterize and to trace viral outbreaks. Nanopore-based whole genome sequencing protocols have been described for several different viruses. These approaches utilize an overlapping amplicon-based approach which can be used to target a specific virus or group of genetically related viruses. In addition to confirmation of the virus presence, sequencing can be used for genomic epidemiology studies, to track viruses and unravel origins, reservoirs and modes of transmission. For such applications, it is crucial to understand possible effects of the error rate associated with the platform used. Routine application in clinical and public health settings requires that this is documented with every important change in the protocol. Previously, a protocol for whole genome Usutu virus sequencing on the nanopore sequencing platform was validated (R9.4 flow cell) by direct comparison to Illumina sequencing. Here, we describe the method used to determine the required read coverage, using the comparison between the R10 flow cell and Illumina sequencing as an example.


Subjects
Flavivirus/genetics , Genome, Viral , Nanopore Sequencing , Whole Genome Sequencing , Consensus Sequence/genetics , DNA Primers/metabolism , Data Analysis , High-Throughput Nucleotide Sequencing/methods , Humans , Polymerase Chain Reaction , Reference Standards
13.
Sci Rep ; 10(1): 6727, 2020 04 21.
Article in English | MEDLINE | ID: mdl-32317695

ABSTRACT

The biology of bacterial cells is, in general, based on information encoded on circular chromosomes. Regulation of chromosome replication is an essential process that mostly takes place at the origin of replication (oriC), a locus unique per chromosome. Identification of high numbers of oriC is a prerequisite for systematic studies that could lead to insights into oriC functioning as well as the identification of novel drug targets for antibiotic development. Current methods for identifying oriC sequences rely on chromosome-wide nucleotide disparities and are therefore limited to fully sequenced genomes, leaving a large number of genomic fragments unstudied. Here, we present gammaBOriS (Gammaproteobacterial oriC Searcher), which identifies oriC sequences on gammaproteobacterial chromosomal fragments. It does so by employing motif-based machine learning methods. Using gammaBOriS, we created BOriS DB, which currently contains 25,827 gammaproteobacterial oriC sequences from 1,217 species, thus making it the largest available database for oriC sequences to date. Furthermore, we present gammaBOriTax, a machine-learning based approach for taxonomic classification of oriC sequences, which was trained on the sequences in BOriS DB. Finally, we extracted the motifs relevant for identification and classification decisions of the models. Our results suggest that machine learning sequence classification approaches can offer great support in functional motif identification.


Subjects
Gammaproteobacteria/classification , Gammaproteobacteria/genetics , Machine Learning , Nucleotide Motifs/genetics , Replication Origin/genetics , Software , Base Sequence , Consensus Sequence/genetics , Models, Genetic , Phylogeny
14.
PLoS One ; 15(4): e0229315, 2020.
Article in English | MEDLINE | ID: mdl-32320410

ABSTRACT

Mutations in the splicing machinery have been implicated in a number of human diseases. Most notably, the U2 small nuclear ribonucleoprotein (snRNP) component SF3b1 has been found to be frequently mutated in blood cancers such as myelodysplastic syndromes (MDS). SF3b1 is a highly conserved HEAT repeat (HR)-containing protein and most of these blood cancer mutations cluster in a hot spot located in HR4-8. Recently, a second mutational hotspot has been identified in SF3b1 located in HR9-12 and is associated with acute myeloid leukemias, bladder urothelial carcinomas, and uterine corpus endometrial carcinomas. The consequences of these mutations on SF3b1 functions during splicing have not yet been tested. We incorporated the corresponding mutations into the yeast homolog of SF3b1 and tested their impact on splicing. We find that all of these HR9-12 mutations can support splicing in yeast, and this suggests that none of them are loss of function alleles in humans. The Hsh155V502F mutation alters splicing of several pre-mRNA reporters containing weak branch sites as well as a genetic interaction with Prp2 and physical interactions with Prp5 and Prp3. The ability of a single allele of Hsh155 to perturb interactions with multiple factors functioning at different stages of the splicing reaction suggests that some SF3b1-mutant disease phenotypes may have a complex origin on the spliceosome.


Subjects
Mutation/genetics , Phosphoproteins/genetics , RNA Precursors/genetics , RNA Splicing Factors/genetics , RNA Splicing/genetics , Repetitive Sequences, Amino Acid , Ribonucleoprotein, U2 Small Nuclear/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Amino Acid Sequence , Consensus Sequence/genetics , Epistasis, Genetic , Humans , Phosphoproteins/chemistry , Protein Binding , RNA Splicing Factors/chemistry , Ribonucleoprotein, U2 Small Nuclear/chemistry , Saccharomyces cerevisiae/growth & development , Saccharomyces cerevisiae Proteins/chemistry
15.
Nat Commun ; 11(1): 1663, 2020 04 03.
Article in English | MEDLINE | ID: mdl-32245964

ABSTRACT

Massively parallel, quantitative measurements of biomolecular activity across sequence space can greatly expand our understanding of RNA sequence-function relationships. We report the development of an RNA-array assay to perform such measurements and its application to a model RNA: the core glmS ribozyme riboswitch, which performs a ligand-dependent self-cleavage reaction. We measure the cleavage rates for all possible single and double mutants of this ribozyme across a series of ligand concentrations, determining kcat and KM values for active variants. These systematic measurements suggest that evolutionary conservation in the consensus sequence is driven by maintenance of the cleavage rate. Analysis of double-mutant rates and associated mutational interactions produces a structural and functional mapping of the ribozyme sequence, revealing the catalytic consequences of specific tertiary interactions, and allowing us to infer structural rearrangements that permit certain sequence variants to maintain activity.


Subjects
Bacterial Proteins/genetics , Evolution, Molecular , RNA, Catalytic/genetics , Riboswitch/genetics , Bacterial Proteins/chemistry , Bacterial Proteins/metabolism , Consensus Sequence/genetics , Crystallography , Enzyme Assays , High-Throughput Nucleotide Sequencing , Ligands , Mutation , Nucleic Acid Conformation , RNA, Catalytic/chemistry , RNA, Catalytic/metabolism , Sequence Analysis, RNA , Structure-Activity Relationship
16.
Plant J ; 103(1): 32-52, 2020 07.
Article in English | MEDLINE | ID: mdl-31981259

ABSTRACT

If two related plant species hybridize, their genomes may be combined and duplicated within a single nucleus, thereby forming an allotetraploid. How the emerging plant balances two co-evolved genomes is still a matter of ongoing research. Here, we focus on satellite DNA (satDNA), the fastest turn-over sequence class in eukaryotes, aiming to trace its emergence, amplification, and loss during plant speciation and allopolyploidization. As a model, we used Chenopodium quinoa Willd. (quinoa), an allopolyploid crop with 2n = 4x = 36 chromosomes. Quinoa originated by hybridization of an unknown female American Chenopodium diploid (AA genome) with an unknown male Old World diploid species (BB genome), dating back 3.3-6.3 million years. Applying short read clustering to quinoa (AABB), C. pallidicaule (AA), and C. suecicum (BB) whole genome shotgun sequences, we classified their repetitive fractions, and identified and characterized seven satDNA families, together with the 5S rDNA model repeat. We show unequal satDNA amplification (two families) and exclusive occurrence (four families) in the AA and BB diploids by read mapping as well as Southern, genomic, and fluorescent in situ hybridization. Whereas the satDNA distributions support C. suecicum as possible parental species, we were able to exclude C. pallidicaule as progenitor due to unique repeat profiles. Using quinoa long reads and scaffolds, we detected only limited evidence of intergenomic homogenization of satDNA after allopolyploidization, but were able to exclude dispersal of 5S rRNA genes between subgenomes. Our results exemplify the complex route of tandem repeat evolution through Chenopodium speciation and allopolyploidization, and may provide sequence targets for the identification of quinoa's progenitors.


Subjects
Chenopodium quinoa/genetics , DNA, Satellite/genetics , Genome, Plant/genetics , Tetraploidy , Chromosomes, Plant/genetics , Consensus Sequence/genetics , Hybridization, Genetic/genetics , Retroelements/genetics , Sequence Alignment , Tandem Repeat Sequences/genetics
17.
Article in English | MEDLINE | ID: mdl-30307874

ABSTRACT

The de-novo genome assembly is a challenging computational problem for which several pipelines have been developed. The advent of long-read sequencing technology has resulted in a new set of algorithmic approaches for the assembly process. In this work, we identify that one of these new and fast long-read assembly techniques (using Minimap2 and Miniasm) can be modified for the short-read assembly process. This possibility motivated us to customize a long-read assembly approach for applications in a short-read assembly scenario. Here, we compare and contrast our proposed de-novo assembly pipeline (MiniSR) with three other recently developed programs for the assembly of bacterial and small eukaryotic genomes. We have documented two trade-offs: one between speed and accuracy and the other between contiguity and base-calling errors. Our proposed assembly pipeline shows a good balance in these trade-offs. The resulting pipeline is 6 and 2.2 times faster than the short-read assemblers Spades and SGA, respectively. MiniSR generates assemblies of superior N50 and NGA50 to SGA, although assemblies are less complete and accurate than those from Spades. A third tool, SOAPdenovo2, is as fast as our proposed pipeline but had poorer assembly quality.


Subjects
Consensus Sequence/genetics , Genomics/methods , Sequence Alignment/methods , DNA Sequence Analysis/methods , Algorithms , Bacterial Genome/genetics , High-Throughput Nucleotide Sequencing
18.
Vet Microbiol ; 239: 108451, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31767095

ABSTRACT

The substantial genetic diversity exhibited by influenza A viruses of swine (IAV-S) represents the main challenge for the development of a broadly protective vaccine against this important pathogen. The consensus immunogen approach has proven effective in overcoming the extraordinary genetic diversity of RNA viruses. In this project, we sought to determine whether a consensus IAV-S hemagglutinin (HA) immunogen would elicit broadly protective immunity in pigs. To address this question, a consensus HA gene (designated H3-CON.1) was generated from a set of 1,112 H3 sequences of IAV-S recorded in GenBank from 2011 to 2015. The consensus HA gene and an HA gene of a naturally occurring H3N2 IAV-S strain (designated H3-TX98) were expressed using the baculovirus expression system and emulsified in an oil-in-water adjuvant for use in vaccination. Pigs vaccinated with the H3-CON.1 immunogen developed broader cross-reactive neutralizing antibody and interferon gamma-secreting cell responses than those vaccinated with the H3-TX98 immunogen. After challenge infection with a fully infectious H3N2 IAV-S isolate, the H3-CON.1-vaccinated pigs shed significantly lower levels of virus in their nasal secretions than the H3-TX98-vaccinated pigs. Collectively, our data provide proof of concept that the consensus immunogen approach may be effectively employed to develop a broadly protective vaccine against IAV-S.
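
As a rough illustration of what a consensus sequence is, the sketch below takes the most frequent residue at each column of a set of pre-aligned, equal-length sequences. The abstract does not describe how H3-CON.1 was actually derived, so the plurality rule, the function name `consensus`, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def consensus(aligned_sequences):
    """Build a plurality consensus from pre-aligned, equal-length sequences.

    Assumption: the input is already a multiple sequence alignment.
    Ties are resolved in favor of the residue seen first in the column.
    """
    if not aligned_sequences:
        raise ValueError("at least one sequence is required")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all sequences must have the same aligned length")

    residues = []
    # zip(*...) iterates over alignment columns rather than rows.
    for column in zip(*aligned_sequences):
        most_common_residue, _count = Counter(column).most_common(1)[0]
        residues.append(most_common_residue)
    return "".join(residues)


if __name__ == "__main__":
    # Toy peptide fragments, not real hemagglutinin sequences.
    print(consensus(["MKTII", "MKAII", "MRTIV"]))
```

Real consensus design usually also has to decide how to treat alignment gaps and sampling bias among input sequences, which this sketch ignores.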


Subjects
Viral Genes/immunology , Influenza Virus Hemagglutinin Glycoproteins/immunology , Influenza Vaccines/immunology , Orthomyxoviridae Infections , Swine Diseases , Vaccination/veterinary , Animals , Viral Antibodies/blood , Consensus Sequence/genetics , Consensus Sequence/immunology , Influenza Virus Hemagglutinin Glycoproteins/genetics , Orthomyxoviridae Infections/immunology , Orthomyxoviridae Infections/virology , Swine , Swine Diseases/immunology , Swine Diseases/virology , Virus Shedding/immunology
19.
Mol Cell Proteomics ; 18(12): 2348-2358, 2019 12.
Article in English | MEDLINE | ID: mdl-31604803

ABSTRACT

Low vaccine efficacy against seasonal influenza A virus (IAV) stems from the ability of the virus to evade existing immunity while maintaining fitness. Although most potent neutralizing antibodies bind antigenic sites on the globular head domain of the IAV envelope glycoprotein hemagglutinin (HA), the error-prone IAV polymerase enables rapid evolution of key antigenic sites, resulting in immune escape. Significantly, the appearance of new N-glycosylation consensus sequences (sequons, NXT/NXS, rarely NXC) on the HA globular domain occurs among the more prevalent mutations as an IAV strain undergoes antigenic drift. The appearance of new glycosylation shields underlying amino acid residues from antibody contact, tunes receptor specificity, and balances receptor avidity with virion escape, all of which help maintain viral propagation through seasonal mutations. The World Health Organization selects seasonal vaccine strains based on information from surveillance, laboratory, and clinical observations. Although the genetic sequences are known, mature glycosylated structures of circulating strains are not defined. In this review, we summarize mass spectrometric methods for quantifying site-specific glycosylation in IAV strains and compare the evolution of IAV glycosylation to that of human immunodeficiency virus. We argue that the determination of site-specific glycosylation of IAV glycoproteins would enable development of vaccines that take advantage of glycosylation-dependent mechanisms whereby virus glycoproteins are processed by antigen presenting cells.
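
The sequon motif named in this abstract (NXT/NXS, rarely NXC) can be located in a protein sequence with a simple pattern search. The sketch below is illustrative only: the function name `find_sequons` and the example peptide are hypothetical, and the exclusion of proline at the X position is the conventional definition of the sequon rather than something stated in the abstract. A matching sequon marks a potential site; it does not show that the site is actually glycosylated.

```python
import re


def find_sequons(protein, include_cys=False):
    """Return (1-based position, motif) pairs for N-glycosylation sequons.

    The motif is N-X-S/T, where X is any residue except proline.
    With include_cys=True the rarer N-X-C form is reported as well.
    A lookahead is used so that overlapping sequons are all found.
    """
    third = "STC" if include_cys else "ST"
    pattern = re.compile(rf"(?=(N[^P][{third}]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    # Toy peptide, not a real hemagglutinin sequence.
    for position, motif in find_sequons("MNGTNPSANNSS"):
        print(position, motif)
```

Comparing the positions returned for an older and a newer strain would show where sequons were gained or lost, which is the kind of change the abstract links to antigenic drift.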


Subjects
Influenza A Virus/immunology , Influenza A Virus/metabolism , Influenza Vaccines/immunology , Animals , Consensus Sequence/genetics , Glycosylation , Humans , Influenza A Virus/genetics , Mass Spectrometry , Mutation
20.
Proc Natl Acad Sci U S A ; 116(29): 14557-14562, 2019 07 16.
Article in English | MEDLINE | ID: mdl-31262814

ABSTRACT

A symmetric origin for bacterial ferredoxins was first proposed over 50 years ago, yet, to date, no functional symmetric molecule has been constructed. It is hypothesized that extant proteins have drifted from their symmetric roots via gene duplication followed by mutations. Phylogenetic analyses of extant ferredoxins support the independent evolution of N- and C-terminal sequences, thereby allowing consensus-based design of symmetric 4Fe-4S molecules. All designs bind two [4Fe-4S] clusters and exhibit strongly reducing midpoint potentials ranging from -405 to -515 mV. One of these constructs efficiently shuttles electrons through a designed metabolic pathway in Escherichia coli. These findings establish that ferredoxins consisting of a symmetric core can be used as a platform to design novel electron transfer carriers for in vivo applications. Outer-shell asymmetry increases sequence space without compromising electron transfer functionality.


Subjects
Escherichia coli Proteins/genetics , Escherichia coli/metabolism , Ferredoxins/genetics , Metabolic Engineering , Consensus Sequence/genetics , Electron Transport/genetics , Escherichia coli/genetics , Escherichia coli Proteins/metabolism , Molecular Evolution , Ferredoxins/metabolism , Gene Duplication , Metabolic Networks and Pathways/genetics , Phylogeny
...